Search CORE

460 research outputs found

Reducing speech recognition time and memory use by means of compound (de-)composition

Author: Martens Jean-Pierre
Réveil Bert
Publication venue: STW Technology Foundation
Publication date: 01/01/2008
Field of study

This paper tackles the problem of Out Of Vocabulary words in Automatic Speech Transcription applications for a compound language (Dutch). A seemingly attractive way to reduce the amount of OOV words in compound languages is to extend the AST system with a compound (de-)composition module. However, thus far, successful implementations of this approach are rather scarce. We developed a novel data driven compound (de-)composition module and tested it in two different AST experiments. For equal lexicon sizes, we see that our compound processor lowers the OOV rate. Moreover we are able to transform that gain in OOV rate into a reduction of the Word Error Rate of the transcription system. Using our approach we built a system with an 84K lexicon that performs as accurately as a baseline system with a 168K lexicon, but our system is 5-6% faster and requires about 50% less storage for the lexical component, even though this component is encoded in an optimal way (prefix-suffix tree compression)

Ghent University Academic Bibliography

Integrating musicological knowledge into a probabilistic framework for chord and key extraction

Author: Martens Jean-Pierre
Pauwels Johan
Publication venue: Audio Engineering Society (AES)
Publication date: 01/01/2010
Field of study

In this contribution a formerly developed probabilistic framework for the simultaneous detection of chords and keys in polyphonic audio is further extended and validated. The system behaviour is controlled by a small set of carefully defined free parameters. This has permitted us to conduct an experimental study which sheds a new light on the importance of musicological knowledge in the context of chord extraction. Some of the obtained results are at least surprising and, to our knowledge, never reported as such before

Ghent University Academic Bibliography

Combining joint factor analysis and iVectors for robust language recognition

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication venue
Publication date: 01/01/2014
Field of study

Ghent University Academic Bibliography

How speaker tongue and name source language affect the automatic recognition of spoken names

Author: D'Hoore Bart
Martens Jean-Pierre
Réveil Bert
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2009
Field of study

In this paper the automatic recognition of person names and geographical names uttered by native and non-native speakers is examined in an experimental set-up. The major aim was to raise our understanding of how well and under which circumstances previously proposed methods of multilingual pronunciation modeling and multilingual acoustic modeling contribute to a better name recognition in a cross-lingual context. To come to a meaningful interpretation of results we have categorized each language according to the amount of exposure a native speaker is expected to have had to this language. After having interpreted our results we have also tried to find an answer to the question of how much further improvement one might be able to attain with a more advanced pronunciation modeling technique which we plan to develop

Ghent University Academic Bibliography

Improving large vocabulary continuous speech recognition by combining GMM-based and reservoir-based acoustic modeling

Author: Demuynck Kris
Martens Jean-Pierre
Triefenbach Fabian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

In earlier work we have shown that good phoneme recognition is possible with a so-called reservoir, a special type of recurrent neural network. In this paper, different architectures based on Reservoir Computing (RC) for large vocabulary continuous speech recognition are investigated. Besides experiments with HMM hybrids, it is shown that a RC-HMM tandem can achieve the same recognition accuracy as a classical HMM, which is a promising result for such a fairly new paradigm. It is also demonstrated that a state-level combination of the scores of the tandem and the baseline HMM leads to a significant improvement over the baseline. A word error rate reduction of the order of 20\% relative is possible

Crossref

Ghent University Academic Bibliography

Robust language recognition via adaptive language factor extraction

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication venue
Publication date: 01/01/2014
Field of study

This paper presents a technique to adapt an acoustically based language classifier to the background conditions and speaker accents. This adaptation improves language classification on a broad spectrum of TV broadcasts. The core of the system consists of an iVector-based setup in which language and channel variabilities are modeled separately. The subsequent language classifier (the backend) operates on the language factors, i.e. those features in the extracted iVectors that explain the observed language variability. The proposed technique adapts the language variability model to the background conditions and to the speaker accents present in the audio. The effect of the adaptation is evaluated on a 28 hours corpus composed of documentaries and monolingual as well as multilingual broadcast news shows. Consistent improvements in the automatic identification of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types

Ghent University Academic Bibliography

Modeling musicological information as trigrams in a system for simultaneous chord and local key extraction

Author: Leman Marc
Martens Jean-Pierre
Pauwels Johan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

In this paper, we discuss the introduction of a trigram musicological model in a simultaneous chord and local key extraction system. By enlarging the context of the musicological model, we hoped to achieve a higher accuracy that could justify the associated higher complexity and computational load of the search for the optimal solution. Experiments on multiple data sets have demonstrated that the trigram model has indeed a larger predictive power (a lower perplexity). This raised predictive power resulted in an improvement in the key extraction capabilities, but no improvement in chord extraction when compared to a system with a bigram musicological model

Crossref

Ghent University Academic Bibliography

Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment

Author: Bocklet Tobias
Martens Jean-Pierre
Middag Catherine
Nöth Elmar
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2011
Field of study

Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, only making use of the acoustic or phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they show to be complementary, resulting in even better intelligibility predictions when both methods are combined

Ghent University Academic Bibliography

Methodological considerations concerning manual annotation of musical audio in function of algorithm development

Author: De Baets Bernard
Leman Marc
Lesaffre Micheline
Martens Jean-Pierre
Publication venue
Publication date: 01/01/2004
Field of study

In research on musical audio-mining, annotated music databases are needed which allow the development of computational tools that extract from the musical audiostream the kind of high-level content that users can deal with in Music Information Retrieval (MIR) contexts. The notion of musical content, and therefore the notion of annotation, is ill-defined, however, both in the syntactic and semantic sense. As a consequence, annotation has been approached from a variety of perspectives (but mainly linguistic-symbolic oriented), and a general methodology is lacking. This paper is a step towards the definition of a general framework for manual annotation of musical audio in function of a computational approach to musical audio-mining that is based on algorithms that learn from annotated data. 1

CiteSeerX

Ghent University Academic Bibliography

Recognition of foreign names spoken by native speakers

Author: Martens Jean-Pierre
Stouten Frederik
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2007
Field of study

It is a challenge to develop a speech recognizer that can handle the kind of lexicons encountered in an automatic attendant or car navigation application. Such lexicons can contain several 100K entries, mainly proper names. Many of these names are of a foreign origin, and native speakers can pronounce them in different ways, ranging from a completely nativized to a completely foreignized pronunciation. In this paper we propose a method that tries to deal with the observed pronunciation variability by introducing the concept of a foreignizable phoneme, and by combining standard acoustic models with a phonologically inspired back-off acoustic model. The main advantage of the approach is that it does not require any foreign phoneme models nor foreign speech data. For the recognition of English names by means of Dutch acoustic models, we obtained a reduction of the word error rate by more than 10% relative

Ghent University Academic Bibliography